@hubgeter
Contributor

What problem does this PR solve?

Related PR: #38432

Problem Summary:
In PR #38432, if the Parquet reader uses the index to read a file and a file column name does not match the table column name, the reader modifies _colname_to_value_range. However, this object is shared by multiple vfile scanners, and modifying it from multiple threads causes a BE core dump.
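
A race of this kind can be avoided in several ways, and the summary does not say which one this PR uses. As a minimal sketch of one option, each scanner could build its own remapped copy and treat the shared map as read-only. The ValueRange struct and the functions remap_for_file and scan_concurrently below are hypothetical names for illustration, not Doris code:

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a per-column predicate range; not the Doris type.
struct ValueRange {
    int low = 0;
    int high = 0;
};

using ColnameToValueRange = std::unordered_map<std::string, ValueRange>;
using NameMapping = std::unordered_map<std::string, std::string>;

// Builds a reader-local map keyed by file column names.
// The shared map is only read here, never modified.
ColnameToValueRange remap_for_file(const ColnameToValueRange& shared,
                                   const NameMapping& table_to_file) {
    ColnameToValueRange local;
    for (const auto& [table_col, range] : shared) {
        auto it = table_to_file.find(table_col);
        // Fall back to the table column name when the file uses the same name.
        const std::string& file_col =
                (it != table_to_file.end()) ? it->second : table_col;
        local.emplace(file_col, range);
    }
    return local;
}

// Simulates n scanners running concurrently. Each thread reads the shared
// map and writes only to its own slot in `results`, so no lock is needed.
std::vector<ColnameToValueRange> scan_concurrently(const ColnameToValueRange& shared,
                                                   const NameMapping& table_to_file,
                                                   int n) {
    assert(n >= 0);
    std::vector<ColnameToValueRange> results(n);
    std::vector<std::thread> threads;
    threads.reserve(n);
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&results, &shared, &table_to_file, i] {
            results[i] = remap_for_file(shared, table_to_file);
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return results;
}
```

Concurrent reads of a const std::unordered_map are safe, which is why this sketch needs no mutex. If the shared object had to be modified in place, guarding it with a lock would be the alternative.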

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

…o read files, there will be multiple threads modify same object
@Thearas
Contributor

Thearas commented Apr 18, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hubgeter
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34952 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 22e44d416772f4cef6bcc0a30f33b9a08f08584b, data reload: false

------ Round 1 ----------------------------------
q1	25956	5082	5054	5054
q2	2068	277	197	197
q3	10482	1249	707	707
q4	10231	996	519	519
q5	8232	2407	2387	2387
q6	188	166	138	138
q7	938	744	616	616
q8	9321	1320	1169	1169
q9	6850	5036	5100	5036
q10	6810	2323	1884	1884
q11	491	273	263	263
q12	357	353	227	227
q13	17775	3716	3131	3131
q14	228	240	212	212
q15	565	500	497	497
q16	443	445	408	408
q17	601	865	351	351
q18	7944	7244	7197	7197
q19	1553	942	567	567
q20	328	334	227	227
q21	4240	3568	3194	3194
q22	1046	1011	971	971
Total cold run time: 116647 ms
Total hot run time: 34952 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5192	5122	5136	5122
q2	246	337	234	234
q3	2185	2669	2322	2322
q4	1457	1961	1509	1509
q5	4536	4429	4354	4354
q6	209	163	126	126
q7	1971	1873	1719	1719
q8	2639	2582	2520	2520
q9	7150	7080	7173	7080
q10	2980	3180	2721	2721
q11	574	504	499	499
q12	670	779	595	595
q13	3544	3918	3298	3298
q14	279	310	284	284
q15	539	495	498	495
q16	468	498	453	453
q17	1137	1583	1386	1386
q18	7829	7462	7504	7462
q19	825	831	951	831
q20	1996	2046	1859	1859
q21	5250	4810	4669	4669
q22	1065	1022	946	946
Total cold run time: 52741 ms
Total hot run time: 50484 ms

@doris-robot

TPC-DS: Total hot run time: 185155 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 22e44d416772f4cef6bcc0a30f33b9a08f08584b, data reload: false

query1	1032	485	492	485
query2	6559	1803	1803	1803
query3	6738	222	219	219
query4	26511	23215	23249	23215
query5	4335	600	450	450
query6	310	208	196	196
query7	4631	490	306	306
query8	324	240	245	240
query9	8638	2543	2528	2528
query10	482	311	266	266
query11	15208	15118	14841	14841
query12	161	113	112	112
query13	1652	526	422	422
query14	8700	6198	6170	6170
query15	202	227	173	173
query16	7285	600	491	491
query17	1187	690	558	558
query18	1960	404	293	293
query19	181	201	153	153
query20	125	120	116	116
query21	204	118	99	99
query22	4144	4104	4033	4033
query23	34326	33062	32993	32993
query24	8451	2371	2369	2369
query25	516	434	409	409
query26	892	268	144	144
query27	2740	490	320	320
query28	4331	2066	2048	2048
query29	721	548	435	435
query30	280	216	179	179
query31	916	841	744	744
query32	73	71	67	67
query33	557	386	314	314
query34	773	844	485	485
query35	805	809	736	736
query36	955	996	899	899
query37	113	104	79	79
query38	4118	4195	4145	4145
query39	1431	1396	1366	1366
query40	211	117	105	105
query41	57	53	51	51
query42	120	135	100	100
query43	498	496	486	486
query44	1301	791	814	791
query45	172	170	171	170
query46	835	1009	620	620
query47	1729	1789	1724	1724
query48	359	396	289	289
query49	740	498	408	408
query50	626	681	392	392
query51	4154	4173	4204	4173
query52	110	104	95	95
query53	233	252	189	189
query54	569	558	500	500
query55	81	79	82	79
query56	321	308	306	306
query57	1123	1168	1063	1063
query58	259	251	263	251
query59	2647	2651	2501	2501
query60	317	320	295	295
query61	131	126	128	126
query62	793	740	682	682
query63	240	186	186	186
query64	3638	1153	654	654
query65	4397	4197	4257	4197
query66	1010	425	320	320
query67	15425	15717	15183	15183
query68	7931	883	507	507
query69	479	301	266	266
query70	1184	1111	1060	1060
query71	461	327	282	282
query72	5587	4820	4796	4796
query73	691	646	339	339
query74	8802	9174	8569	8569
query75	3853	3236	2705	2705
query76	3607	1185	747	747
query77	780	359	281	281
query78	10030	10133	9301	9301
query79	1752	804	551	551
query80	576	508	439	439
query81	478	246	216	216
query82	198	132	96	96
query83	261	245	241	241
query84	246	99	81	81
query85	780	345	305	305
query86	348	316	297	297
query87	4325	4364	4376	4364
query88	3442	2160	2170	2160
query89	381	313	284	284
query90	2074	213	208	208
query91	137	142	111	111
query92	76	60	61	60
query93	1334	952	578	578
query94	621	415	291	291
query95	374	292	285	285
query96	474	556	268	268
query97	3093	3250	3123	3123
query98	242	210	207	207
query99	1329	1415	1300	1300
Total cold run time: 270609 ms
Total hot run time: 185155 ms

@doris-robot

ClickBench: Total hot run time: 29.99 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 22e44d416772f4cef6bcc0a30f33b9a08f08584b, data reload: false

query1	0.04	0.03	0.03
query2	0.12	0.11	0.11
query3	0.25	0.20	0.20
query4	1.60	0.20	0.19
query5	0.59	0.59	0.60
query6	1.17	0.72	0.73
query7	0.02	0.02	0.02
query8	0.04	0.04	0.03
query9	0.58	0.51	0.52
query10	0.56	0.57	0.57
query11	0.16	0.11	0.10
query12	0.14	0.11	0.12
query13	0.60	0.60	0.61
query14	1.19	1.20	1.18
query15	0.88	0.86	0.83
query16	0.38	0.38	0.38
query17	1.04	1.03	1.07
query18	0.21	0.20	0.20
query19	1.81	1.79	1.73
query20	0.01	0.02	0.01
query21	15.41	0.90	0.57
query22	0.78	1.26	0.66
query23	14.85	1.38	0.62
query24	7.04	1.43	1.25
query25	0.51	0.24	0.06
query26	0.47	0.16	0.14
query27	0.05	0.04	0.04
query28	10.25	0.88	0.43
query29	12.70	4.00	3.33
query30	0.25	0.08	0.06
query31	2.82	0.58	0.38
query32	3.22	0.55	0.46
query33	3.00	3.05	3.07
query34	15.72	4.99	4.51
query35	4.53	4.55	4.48
query36	0.66	0.50	0.49
query37	0.08	0.06	0.06
query38	0.05	0.04	0.03
query39	0.03	0.03	0.02
query40	0.17	0.13	0.13
query41	0.08	0.03	0.03
query42	0.04	0.03	0.02
query43	0.03	0.03	0.03
Total cold run time: 104.13 s
Total hot run time: 29.99 s

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 38.46% (5/13) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.16% (14428/27139)
Line Coverage 42.02% (124996/297464)
Region Coverage 40.85% (63876/156375)
Branch Coverage 35.49% (32127/90528)

Contributor

@morningman morningman left a comment

LGTM

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions bot added the approved and reviewed labels Apr 18, 2025
@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@kaka11chen kaka11chen left a comment

LGTM

Contributor

@morningman morningman left a comment

LGTM

@morningman morningman merged commit f61e55e into apache:master Apr 23, 2025
28 of 30 checks passed
hubgeter added a commit to hubgeter/doris that referenced this pull request Apr 28, 2025
…o read files, there will be multiple threads modify same object (apache#50161)
morningman pushed a commit to hubgeter/doris that referenced this pull request Apr 29, 2025
…o read files, there will be multiple threads modify same object (apache#50161)
dataroaring pushed a commit that referenced this pull request May 6, 2025
…o read files, there will be multiple threads modify same object. (#50161) (#50415)

bp #50161
morningman pushed a commit to hubgeter/doris that referenced this pull request May 6, 2025
…o read files, there will be multiple threads modify same object (apache#50161)
yiguolei pushed a commit that referenced this pull request May 8, 2025
…o read files, there will be multiple threads modify same object (#50161) (#50496)

bp #50161
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…o read files, there will be multiple threads modify same object (apache#50161)
Labels

approved, dev/2.1.10-merged, dev/3.0.6-merged, p0_c, reviewed